Sufficient Dimension Reduction via Squared-loss Mutual Information Estimation
Authors
Abstract
The goal of sufficient dimension reduction in supervised learning is to find the low-dimensional subspace of input features that contains all of the information about the output values that the input features possess. In this letter, we propose a novel sufficient dimension-reduction method using a squared-loss variant of mutual information as a dependency measure. We apply a density-ratio estimator for approximating squared-loss mutual information that is formulated as a minimum contrast estimator on parametric or nonparametric models. Since cross-validation is available for choosing an appropriate model, our method does not require any prespecified structure on the underlying distributions. We elucidate the asymptotic bias of our estimator on parametric models and the asymptotic convergence rate on nonparametric models. The convergence analysis utilizes the uniform tail-bound of a U-process, and the convergence rate is characterized by the bracketing entropy of the model. We then develop a natural gradient algorithm on the Grassmann manifold for sufficient subspace search. The analytic formula of our estimator allows us to compute the gradient efficiently. Numerical experiments show that the proposed method compares favorably with existing dimension-reduction approaches on artificial and benchmark data sets.
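To make the pipeline in the abstract concrete, here is a minimal sketch of my own (not the authors' code), assuming Gaussian kernel basis functions and illustrative hyperparameters; the paper's actual method selects models by cross-validation and uses an analytic natural gradient on the Grassmann manifold, whereas the sketch below substitutes a crude finite-difference gradient with QR retraction:

```python
import numpy as np

def lsmi(x, y, sigma=1.0, lam=1e-3, n_basis=100, seed=0):
    """Least-squares SMI estimate between x (n, d) and y (n, 1) via a
    density-ratio model w(x, y) = sum_l alpha_l K(x, x_l) K(y, y_l)."""
    n = len(x)
    idx = np.random.default_rng(seed).choice(n, size=min(n_basis, n), replace=False)
    Kx = np.exp(-((x[:, None, :] - x[idx][None, :, :]) ** 2).sum(-1) / (2 * sigma**2))
    Ky = np.exp(-((y[:, None, :] - y[idx][None, :, :]) ** 2).sum(-1) / (2 * sigma**2))
    h = (Kx * Ky).mean(axis=0)            # empirical mean of phi_l under p(x, y)
    H = (Kx.T @ Kx) * (Ky.T @ Ky) / n**2  # empirical moments under p(x) p(y)
    alpha = np.linalg.solve(H + lam * np.eye(len(idx)), h)  # ridge solution
    return 0.5 * h @ alpha - 0.5          # SMI = E_{p(x,y)}[w] / 2 - 1/2

def sdr_subspace(x, y, k, n_iter=30, eta=0.5, eps=1e-4):
    """Search an orthonormal projection W (k, d) maximizing lsmi(x W^T, y).
    Finite differences + QR retraction stand in for the paper's analytic
    natural gradient on the Grassmann manifold."""
    d = x.shape[1]
    W = np.linalg.qr(np.random.default_rng(0).normal(size=(d, k)))[0].T
    for _ in range(n_iter):
        base = lsmi(x @ W.T, y)
        G = np.zeros_like(W)
        for i in range(k):
            for j in range(d):
                Wp = W.copy()
                Wp[i, j] += eps
                G[i, j] = (lsmi(x @ Wp.T, y) - base) / eps
        W = np.linalg.qr((W + eta * G).T)[0].T  # retract to orthonormal rows
    return W

# Toy check: the output depends on the first input coordinate only.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 4))
y = x[:, :1] ** 2 + 0.1 * rng.normal(size=(200, 1))
print(sdr_subspace(x, y, k=1))  # weight should concentrate on coordinate 0
```

Because the SMI estimate is an analytic function of the projected samples, the true method can differentiate it in closed form; the finite-difference loop above is only to keep the sketch short.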
Similar resources
Computationally Efficient Sufficient Dimension Reduction via Squared-Loss Mutual Information
The purpose of sufficient dimension reduction (SDR) is to find a low-dimensional expression of input features that is sufficient for predicting output values. In this paper, we propose a novel distribution-free SDR method called sufficient component analysis (SCA), which is computationally more efficient than existing methods. In our method, a solution is computed by iteratively performing depe...
Estimation of Squared-Loss Mutual Information from Positive and Unlabeled Data
Capturing input-output dependency is an important task in statistical data analysis. Mutual information (MI) is a vital tool for this purpose, but it is known to be sensitive to outliers. To cope with this problem, a squared-loss variant of MI (SMI) was proposed, and its supervised estimator has been developed. On the other hand, in real-world classification problems, it is conceivable that onl...
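For context, SMI is the Pearson-divergence counterpart of ordinary mutual information; in the notation standard to this line of work (not spelled out in the snippet above) it is defined as

```latex
\mathrm{SMI}(X, Y) = \frac{1}{2} \iint p_x(x)\, p_y(y)
  \left( \frac{p_{xy}(x, y)}{p_x(x)\, p_y(y)} - 1 \right)^{2} \mathrm{d}x\, \mathrm{d}y .
```

SMI is non-negative and vanishes exactly when x and y are independent; replacing the logarithm in MI with a squared deviation of the density ratio from one is what yields the robustness to outliers mentioned above.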
Sufficient Component Analysis
The purpose of sufficient dimension reduction (SDR) is to find a low-dimensional expression of input features that is sufficient for predicting output values. In this paper, we propose a novel distribution-free SDR method called sufficient component analysis (SCA), which is computationally more efficient than existing methods. In our method, a solution is computed by iteratively performing depe...
Sufficient Dimension Reduction via Direct Estimation of the Gradients of Logarithmic Conditional Densities
Sufficient dimension reduction (SDR) is aimed at obtaining the low-rank projection matrix in the input space such that information about output data is maximally preserved. Among various approaches to SDR, a promising method is based on the eigendecomposition of the outer product of the gradient of the conditional density of output given input. In this letter, we propose a novel estimator of th...
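As a hedged illustration of the eigendecomposition step this snippet refers to (the gradient estimator itself is that letter's contribution and is not reproduced here; `sdr_from_gradients` is an invented name):

```python
import numpy as np

def sdr_from_gradients(grads, k):
    """Span the SDR subspace with the top-k eigenvectors of the averaged
    outer product of estimated gradients of the log conditional density.
    grads: (n, d) per-sample gradient estimates, however obtained."""
    M = grads.T @ grads / len(grads)  # (d, d) averaged outer product
    _, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    return eigvecs[:, -k:]            # (d, k) basis for the estimated subspace
```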
Computationally Efficient Estimation of Squared-Loss Mutual Information with Multiplicative Kernel Models
Squared-loss mutual information (SMI) is a robust measure of the statistical dependence between random variables. The sample-based SMI approximator called least-squares mutual information (LSMI) was demonstrated to be useful in performing various machine learning tasks such as dimension reduction, clustering, and causal inference. The original LSMI approximates the pointwise mutual information ...
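The analytic tractability referred to here is the closed-form ridge solution used throughout the LSMI literature (notation mine, not quoted from the paper): with basis functions, let h-hat collect their empirical means under the joint distribution and H-hat their empirical second moments under the product of marginals; then

```latex
\hat{\alpha} = \bigl(\hat{H} + \lambda I\bigr)^{-1} \hat{h},
\qquad
\widehat{\mathrm{SMI}} = \tfrac{1}{2}\, \hat{h}^{\top} \hat{\alpha} - \tfrac{1}{2}.
```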
Journal: Neural Computation
Volume: 25, Issue: 3
Pages: -
Publication year: 2010